Objective: To identify differentially expressed genes, you must first conduct quality control assessments of your experiments. Next, perform statistical diagnostic analyses to detect and correct any experimental biases.
The experiments were performed comparing one cell culture incubated at atmospheric oxygen conditions (call normoxic and labelled using Green dye) and another one incubated in 1% O2 (call hypoxic and labelled using Red dye).
library(marray)
The R package Marray offers several functions to:
Input : A GPR file with detailed information for each spot on the slide (gene name, Red and Green intensity values, background intensities and other statistics).
# Read the GPR file using the marray package function read.GenePix
rawdata <- read.GenePix(fnames="dataFile1_normAnalysis.gpr",
path= "/shared/projects/2420_ens_hts/data/microarrays")
## Reading ... /shared/projects/2420_ens_hts/data/microarrays/dataFile1_normAnalysis.gpr
Note : This function reads a GPR file and creates objects of class marrayRaw. In these objects, you can find, for instance, vectors with intensity values (rawdata@maRf or rawdata@maGf). These vectors can be manipulated using classical R functions like “summary()”, “hist()”, etc.
# f is for foreground
# Intensity values in red/hypoxic channel
head(rawdata@maRf)
## /shared/projects/2420_ens_hts/data/microarrays/dataFile1_normAnalysis.gpr
## [1,] 992
## [2,] 1907
## [3,] 559
## [4,] 645
## [5,] 32939
## [6,] 681
# Intensity values in green/normoxic channel
head(rawdata@maGf)
## /shared/projects/2420_ens_hts/data/microarrays/dataFile1_normAnalysis.gpr
## [1,] 2561
## [2,] 2585
## [3,] 1588
## [4,] 1604
## [5,] 43755
## [6,] 1732
Visualisation of foreground signal over the microarray glass slide
# Red/hypoxic signals
image(rawdata,
xvar = "maRf",
main = "Hypoxic signal (with flags)")
## [1] FALSE
## NULL
# Green/normoxic signals
image(rawdata,
xvar = "maGf",
main = "Normoxic signal (with flags)")
## [1] FALSE
## NULL
Visualize background signals in Red/Hypoxic and Green/Normoxic channels. Try to interpret the obtained results. What do you think about the quality of the experiment?
# b is for background
# Red/Hypoxic channel background signals
image(rawdata,
xvar = "maRb",
main = "Hypoxic background (with flags)")
## [1] FALSE
## NULL
# Green/Normoxic channel background signals
image(rawdata,
xvar = "maGb",
main = "Normoxic background (with flags)")
## [1] FALSE
## NULL
Low quality spots will be eliminated from further analyses. Intensity values in foreground and background signals will be replaced with the R symbol “NA” (Not Available).
# You work on a copy of the rawdata
rawdataQC <- rawdata
#Background values to NA
rawdataQC@maRb[rawdataQC@maW < 0] = NA
rawdataQC@maGb[rawdataQC@maW < 0] = NA
#Signal values to NA
rawdataQC@maRf[rawdataQC@maW < 0] = NA
rawdataQC@maGf[rawdataQC@maW < 0] = NA
An intuitive approach for background correction consists in subtracting background intensity values (rawdata@maRb and rawdata@maGb) from global foreground intensities (rawdata@maRf and rawdata@maGf). Nevertheless this method is debatable because for low intensities it adds more noise to the data and creates overestimated log(Ratio) values.
Draw a MA plot with background subtracted from foreground values
plot(rawdataQC,legend.func = NULL, ylim=c(-9,9), xlim=c(3,16), main = "MA plot before normalization (background subtraction)")
Draw a MA plot with foreground values only
# You work on a copy of the rawdata
rawdataQCnoBg <- rawdataQC
#Replace all background by 0 as by default background is subtracted from foreground
rawdataQCnoBg@maGb[] = 0
rawdataQCnoBg@maRb[] = 0
plot(rawdataQCnoBg,legend.func = NULL, ylim=c(-9,9), xlim=c(3,16), main = "MA plot before normalization (no background subtraction)")
The following analyses will be performed with no background subtraction.
plot(rawdataQCnoBg, main = "MA plot before normalization")
boxplot(rawdataQCnoBg, main = "Boxplot before normalization")
rawdataQCnoBgMedian <- maNorm(rawdataQCnoBg, norm = "median", echo = T)
## Normalization method: median.
## Normalizing array 1.
rawdataQCnoBgLoess <- maNorm(rawdataQCnoBg, norm = "loess", echo = T)
## Normalization method: loess.
## Normalizing array 1.
MA plots after normalization
plot(rawdataQCnoBgMedian, legend.func = NULL, main = "norm = Median")
plot(rawdataQCnoBgLoess, legend.func = NULL, main = "norm = Loess")
Boxplots after normalization
boxplot(rawdataQCnoBgMedian, main = "norm = Median")
boxplot(rawdataQCnoBgLoess, main = "norm = Loess")
plot(density(maM(rawdataQCnoBgLoess),na.rm = T),
lwd = 2, col = 2, main = "Distribution of log(Ratio)")
lines(density(maM(rawdataQCnoBg), na.rm = T), lwd = 2)
abline(v = 0)
legend(x= 0.5, y= 1.2,c("Before normalization","After normalization with loess"), fill = c(1,2))
#Date
format(Sys.time(), "%d %B, %Y, %H,%M")
## [1] "21 August, 2024, 13,56"
#Packages used
sessionInfo()
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-conda-linux-gnu
## Running under: Ubuntu 20.04.6 LTS
##
## Matrix products: default
## BLAS/LAPACK: /shared/ifbstor1/software/miniconda/envs/r-4.4.1/lib/libopenblasp-r0.3.27.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Europe/Paris
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] marray_1.82.0 limma_3.60.3
##
## loaded via a namespace (and not attached):
## [1] digest_0.6.36 R6_2.5.1 fastmap_1.2.0 xfun_0.45
## [5] cachem_1.1.0 knitr_1.48 htmltools_0.5.8.1 rmarkdown_2.27
## [9] lifecycle_1.0.4 cli_3.6.3 sass_0.4.9 jquerylib_0.1.4
## [13] statmod_1.5.0 compiler_4.4.1 highr_0.11 rstudioapi_0.16.0
## [17] tools_4.4.1 evaluate_0.24.0 bslib_0.7.0 yaml_2.3.9
## [21] rlang_1.1.4 jsonlite_1.8.8